Abstract

The unfolding program TRUEE is a software package for the numerical solution of inverse problems. The algorithm was first applied in the FORTRAN 77 program RUN. RUN is an event-based unfolding algorithm which makes use of Tikhonov regularization. It has been tested and compared to different unfolding applications and stood out with notably stable results and reliable error estimation. TRUEE is a conversion of RUN to C++, which works within the ROOT framework. The program has been extended for more user-friendliness and delivers unfolding results which are identical to those of RUN. Besides the simple installation of the software and the generation of graphics, there are new functions which facilitate the choice of unfolding parameters and observables for the user.

In this paper, we introduce the new unfolding program and present its performance by applying it to two exemplary data sets from astroparticle physics, taken with the MAGIC telescopes and the IceCube neutrino detector, respectively.

Keywords: unfolding, astroparticle physics, deconvolution, MAGIC, 
IceCube 

Introduction

Solving inverse problems can be described as a method to find the cause of known consequences. Problems of this kind manifest themselves in a wide range of research fields such as natural sciences, economics and engineering. Looking at physics as an exemplary field, inverse problems are among the fundamental challenges in various areas, for instance particle physics, crystallography or medicine. The particular problems and solutions in this paper are presented and described in the context of astroparticle physics. The nomenclature used here mainly follows [1].

The structure of this paper comprises three main sections. First, the class of inverse problems and the general procedure of unfolding with regularization are outlined. In the second section, the new unfolding program TRUEE is introduced. Subsequently, the first applications of the program in astroparticle physics, namely in the data analysis of the experiments MAGIC and IceCube, are presented in the third section. We conclude with a summary of the obtained results and an outlook on further extensions and applications of the program.
1. Inverse problems and unfolding

In general, the distribution f(x) of a variable x has to be determined. However, it is often not possible to measure the value x directly. Instead, the detector records x-correlated variables y. These signals can be seen as the aforementioned consequences of the cause x. The goal is to obtain the best possible estimate of the f(x) distribution from the measured g(y) distribution. As the measurement in a real experiment is distorted, this is not trivial. A direct mapping of a value x to a value y is not possible, because one x value causes different signals with different y values with certain probabilities. Furthermore, the probability to record a signal at all is usually less than one and depends on x, which causes a loss of events. Thus, the transformation from x to y is disturbed by the finite resolution and the limited acceptance of a real detector.

In mathematics, this problem can be described by the Fredholm integral equation [2]

$$g(y) = \int_c^d A(y, x)\, f(x)\, dx + b(y), \tag{1}$$

where g(y) is the distribution of the measured observable y, which can in general be multidimensional. The function A(y, x) is called the kernel or response function and includes all effects which occur in a real measurement process. In most cases, this function is not known exactly and has to be determined by Monte Carlo (MC) simulations, in which both the measured and the true distributions are known. The parameters c and d are the integration limits of the range where x is defined (c < x < d). The function b(y) is the distribution of a possible background, which is assumed to be known.



In reality, the measurement delivers discrete values. Furthermore, the handling by the algorithm requires a numerical description of the distributions. Thus, a discretization of all functions is required. The distribution f(x) can be parametrized with basis spline (B-spline) functions p_j(x) [3] and the corresponding coefficients a_j:

$$f(x) = \sum_{j=1}^{m} a_j\, p_j(x). \tag{2}$$

The B-spline functions consist of several polynomials of low degree. In the following, cubic B-splines are used. They consist of four polynomials of third degree each. The points where adjacent polynomials join are called knots. At the knot positions, a cubic B-spline is continuously differentiable up to the second derivative, which is important because the second derivative is used for the implemented regularization (see Eq. (8)). For equidistant knots, the cubic B-splines are bell-shaped. Because of the low degree of the polynomials, an interpolation with B-spline functions does not tend to oscillate. By using this parametrization, the B-spline functions can be included in the response function during the discretization:



$$\int_c^d A(y, x)\, f(x)\, dx = \sum_{j=1}^{m} a_j \left( \int_c^d A(y, x)\, p_j(x)\, dx \right) = \sum_{j=1}^{m} a_j\, A_j(y). \tag{3}$$
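For illustration, the cubic B-splines p_j(x) of Eq. (2) can be evaluated with the standard Cox-de Boor recursion. The Python sketch below is only an illustration under stated assumptions: the helper names `bspline_basis` and `f_from_coefficients` are hypothetical, the knots are equidistant, and the treatment of the range boundaries in TRUEE may differ.

```python
def bspline_basis(j, k, t, x):
    """Value at x of the j-th B-spline of degree k on the knot vector t (Cox-de Boor)."""
    if k == 0:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0
    left = right = 0.0
    if t[j + k] != t[j]:
        left = (x - t[j]) / (t[j + k] - t[j]) * bspline_basis(j, k - 1, t, x)
    if t[j + k + 1] != t[j + 1]:
        right = ((t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1])
                 * bspline_basis(j + 1, k - 1, t, x))
    return left + right


def f_from_coefficients(a, t, x, k=3):
    """Eq. (2): f(x) = sum_j a_j p_j(x), here with cubic (k = 3) B-splines."""
    return sum(a_j * bspline_basis(j, k, t, x) for j, a_j in enumerate(a))


knots = [float(i) for i in range(10)]      # hypothetical equidistant knots 0, 1, ..., 9
n_splines = len(knots) - 3 - 1             # 6 cubic B-splines fit on 10 knots
print(bspline_basis(0, 3, knots, 2.0))     # peak of the bell-shaped p_0, approximately 2/3
# Inside [knots[3], knots[-4]] the basis sums to one, so constant coefficients give a constant f.
print(f_from_coefficients([5.0] * n_splines, knots, 4.5))   # approximately 5.0
```

The recursion makes the bell shape and the local support of each p_j(x) (four knot intervals) explicit, which is why a single coefficient a_j only influences f(x) locally.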

By integrating over the y-intervals, the kernel function becomes a response matrix:

$$A_{ij} = \int_{y_{i-1}}^{y_i} A_j(y)\, dy. \tag{4}$$



The same integration can be carried out for the measured distribution g(y) and the background distribution b(y):

$$g_i = \int_{y_{i-1}}^{y_i} g(y)\, dy, \tag{5}$$

$$b_i = \int_{y_{i-1}}^{y_i} b(y)\, dy. \tag{6}$$

Consequently, the Fredholm integral equation becomes the matrix equation

$$\mathbf{g} = A\,\mathbf{a} + \mathbf{b}, \tag{7}$$

with g, a and b as vectors and A as the response matrix. To determine the sought distribution f(x), the coefficients a_j need to be found.
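How Eq. (4) and Eq. (7) are fed by MC events can be illustrated with a deliberately simplified sketch: it replaces the B-spline basis p_j(x) by box functions (histogram bins in x) and divides by the number of generated events per bin, so that A also carries the acceptance. The names `response_matrix`, `fold` and the argument `n_generated` are hypothetical and do not describe TRUEE's internal interface.

```python
def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value, else None."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def response_matrix(mc_events, n_generated, x_edges, y_edges):
    """Estimate A_ij = P(y in bin i | x in bin j) from MC pairs (x_true, y_obs).

    n_generated[j] is the number of generated MC events with x in bin j, so events
    lost to the limited acceptance make the column sums smaller than one.
    """
    counts = [[0.0] * (len(x_edges) - 1) for _ in range(len(y_edges) - 1)]
    for x_true, y_obs in mc_events:
        j, i = bin_index(x_true, x_edges), bin_index(y_obs, y_edges)
        if i is not None and j is not None:
            counts[i][j] += 1.0
    return [[c / n_generated[j] for j, c in enumerate(row)] for row in counts]


def fold(A, a, b):
    """Eq. (7): expected measured counts g = A a + b."""
    return [sum(A_ij * a_j for A_ij, a_j in zip(row, a)) + b_i for row, b_i in zip(A, b)]


mc = [(0.5, 0.4), (0.5, 1.2), (1.5, 1.6), (1.5, 0.7), (1.5, 1.9)]   # hypothetical MC pairs
A = response_matrix(mc, n_generated=[4, 4], x_edges=[0, 1, 2], y_edges=[0, 1, 2])
print(A)                                          # [[0.25, 0.25], [0.25, 0.5]]
print(fold(A, a=[100.0, 200.0], b=[5.0, 5.0]))    # [80.0, 130.0]
```

With a B-spline basis, each MC event would instead contribute p_j(x_true) to every column j, which is the event-wise analogue of the integral in Eq. (3).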

Solving Eq. (7) is called unfolding and is generally not trivial. Due to the finite resolution, a smoothing effect on the measured distribution g is introduced. When the matrix equation is solved for a, this smoothing effect is inverted and results in implausible oscillations of the sought distribution f(x). The most straightforward approach for the solution is the inversion of the response matrix A, if A is square and non-singular. The resulting inverse matrix A^{-1} contains negative off-diagonal elements and very large diagonal elements. This causes the mentioned oscillations, which appear in any approach of solving Eq. (7) if no additional corrections are applied. This is known as an ill-posed problem and occurs in essentially every measurement process with finite resolution.
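A two-bin toy example with hypothetical numbers makes the ill-posedness concrete: the inverse of a smearing matrix has large diagonal and negative off-diagonal elements, and it amplifies statistical fluctuations of g.

```python
def inverse_2x2(A):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Toy response: 62.5 % of the events stay in their bin, 37.5 % migrate to the other one.
A = [[0.625, 0.375], [0.375, 0.625]]
A_inv = inverse_2x2(A)
print(A_inv)                           # [[2.5, -1.5], [-1.5, 2.5]]

g = matvec(A, [100.0, 100.0])          # noise-free measurement of a flat distribution
g_noisy = [g[0] + 10.0, g[1] - 10.0]   # a fluctuation of +-10 events
print(matvec(A_inv, g_noisy))          # [140.0, 60.0]: the fluctuation grows to +-40
```

The alternating component of g is divided by the smaller eigenvalue of A (0.625 - 0.375 = 0.25), hence the factor of four; with finer binning and stronger migration these eigenvalues become much smaller and the oscillations much larger.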

To suppress the oscillations in the unfolded distribution, so-called regularization methods are applied. In the presented realization, Tikhonov regularization [4] is implemented. In its generalized form, the method requires the linear combination of the unfolding term with a regularization term (sometimes called penalty term), which contains a regularization factor. The regularization term contains an operator which implies some a-priori assumptions about the solution, such as smoothness. In the current case, the smoothness of the solution is controlled by the curvature operator C. A large curvature corresponds to large oscillations. Thus, a reduction of the curvature implies a reduction of the oscillations and smooths the resulting distribution. Since the parametrization of f(x) is based on cubic B-spline functions, the curvature r(a), the integral of the squared second derivative of f(x), takes the simple form of a matrix equation

$$r(\mathbf{a}) = \mathbf{a}^T C\, \mathbf{a}, \tag{8}$$

with C as a known, symmetric, positive-semidefinite curvature matrix.
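The exact curvature matrix for cubic B-splines follows from integrating products of second derivatives of the basis functions. As a simpler stand-in (an assumption for illustration, not TRUEE's matrix), the sketch below uses the discrete second-difference operator D and C = D^T D, which is likewise symmetric and positive-semidefinite and vanishes for linearly rising coefficients. The function names are hypothetical.

```python
def second_difference_matrix(m):
    """(m - 2) x m matrix D whose rows are (..., 1, -2, 1, ...)."""
    D = [[0.0] * m for _ in range(m - 2)]
    for i in range(m - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return D


def curvature_matrix(m):
    """C = D^T D, a symmetric positive-semidefinite stand-in for the curvature matrix."""
    D = second_difference_matrix(m)
    return [[sum(D[k][i] * D[k][j] for k in range(m - 2)) for j in range(m)]
            for i in range(m)]


def curvature(a, C):
    """Eq. (8): r(a) = a^T C a."""
    n = len(a)
    return sum(a[i] * C[i][j] * a[j] for i in range(n) for j in range(n))


C = curvature_matrix(5)
print(curvature([1.0, 2.0, 3.0, 4.0, 5.0], C))   # 0.0: linear coefficients are not penalized
print(curvature([1.0, 3.0, 1.0, 3.0, 1.0], C))   # 48.0: oscillating coefficients are
```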

The actual unfolding is performed as follows. First, the response matrix A is calculated based on the MC sample. To determine the coefficients a of the final result, the unfolding equation (Eq. (7)) is set up, where g is the distribution of the actually measured observable. To fit the right-hand side to the left-hand side of this equation, a maximum likelihood fit is performed. For simplicity, a negative log-likelihood function

$$S(\mathbf{a}) = \sum_i \left( g_i(\mathbf{a}) - g_{i,m} \ln g_i(\mathbf{a}) \right) \tag{9}$$

is formed and minimized. Here g_{i,m} is the number of measured events in an interval i, including the possible background contribution in this region. This number follows the Poisson distribution with mean value g_i. A Taylor expansion of the negative log-likelihood function can be written as






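$$S(\mathbf{a}) \approx S(\tilde{\mathbf{a}}) + (\mathbf{a} - \tilde{\mathbf{a}})^T \mathbf{h} + \frac{1}{2} (\mathbf{a} - \tilde{\mathbf{a}})^T H\, (\mathbf{a} - \tilde{\mathbf{a}}), \tag{10}$$

with gradient h, Hessian matrix H and ã as a first estimate of the coefficients which have to be found.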
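For the Poisson likelihood of Eq. (9) with g(a) = A a + b, the gradient and Hessian entering the expansion have simple closed forms, h_j = sum_i A_ij (1 - g_i,m / g_i) and H_jk = sum_i A_ij A_ik g_i,m / g_i^2. The following Python sketch evaluates them; the function names and the toy numbers are hypothetical.

```python
import math


def expected(A, a, b):
    """Expected counts g_i(a) = sum_j A_ij a_j + b_i (Eq. (7))."""
    return [sum(A_ij * a_j for A_ij, a_j in zip(row, a)) + b_i for row, b_i in zip(A, b)]


def nll(A, a, b, g_meas):
    """Eq. (9): S(a) = sum_i [g_i(a) - g_i,m ln g_i(a)]; a-independent terms dropped."""
    return sum(g - gm * math.log(g) for g, gm in zip(expected(A, a, b), g_meas))


def gradient_and_hessian(A, a, b, g_meas):
    """Gradient h and Hessian H of S at a, as used in the expansion (10)."""
    g = expected(A, a, b)
    n_y, m = len(g), len(a)
    h = [sum(A[i][j] * (1.0 - g_meas[i] / g[i]) for i in range(n_y)) for j in range(m)]
    H = [[sum(A[i][j] * A[i][k] * g_meas[i] / g[i] ** 2 for i in range(n_y))
          for k in range(m)] for j in range(m)]
    return h, H


A = [[0.5, 0.25], [0.25, 0.5]]   # hypothetical response matrix
b = [1.0, 1.0]                   # known background per bin
g_meas = [66.0, 71.0]            # measured counts
h, H = gradient_and_hessian(A, [80.0, 100.0], b, g_meas)
print(h)   # [0.0, 0.0]: A a + b reproduces the measurement, so S is minimal here
```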

After including the regularization (Eq. (8)), the final fit function

$$R(\mathbf{a}) = S(\tilde{\mathbf{a}}) + (\mathbf{a} - \tilde{\mathbf{a}})^T \mathbf{h} + \frac{1}{2} (\mathbf{a} - \tilde{\mathbf{a}})^T H\, (\mathbf{a} - \tilde{\mathbf{a}}) + \frac{1}{2}\, \tau\, \mathbf{a}^T C\, \mathbf{a} \tag{11}$$

has to be minimized to obtain the unfolded result. The regularization parameter τ controls the strength of the regularization. The challenge is to find a proper value for τ that yields an optimal estimate of the result, balancing the oscillations against the smoothing effect of the regularization.
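Since R(a) in Eq. (11) is quadratic in a, setting its gradient to zero gives the linear system (H + τC) a = H ã - h. The sketch below solves this system for one step; in practice the step would be iterated with h and H re-evaluated at the new estimate. The names `solve` and `regularized_step` and all numbers are hypothetical.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v_i] for row, v_i in zip(M, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def regularized_step(H, h, C, a_tilde, tau):
    """Minimum of R(a) in Eq. (11): solve (H + tau C) a = H a_tilde - h."""
    n = len(h)
    M = [[H[i][j] + tau * C[i][j] for j in range(n)] for i in range(n)]
    rhs = [sum(H[i][j] * a_tilde[j] for j in range(n)) - h[i] for i in range(n)]
    return solve(M, rhs)


H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = [0.0, 0.0, 0.0]                  # a_tilde is already the unregularized minimum
C = [[1.0, -2.0, 1.0], [-2.0, 4.0, -2.0], [1.0, -2.0, 1.0]]   # second-difference curvature
a_tilde = [1.0, 3.0, 1.0]            # oscillating first estimate
for tau in (0.0, 1.0, 100.0):
    # tau = 0 returns a_tilde; larger tau pulls the coefficients towards a straight line
    print(tau, [round(v, 3) for v in regularized_step(H, h, C, a_tilde, tau)])
```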

One method to define a value for τ is to set up the relation between τ and the effective number of degrees of freedom n_df:

$$n_{df} = \sum_{j=1}^{m} \frac{1}{1 + \tau S_{jj}}. \tag{12}$$

Here S_jj are the eigenvalues of the diagonalized curvature matrix C, arranged in increasing order. The summands in Eq. (12) can be considered as filter factors for the coefficients. These coefficients represent the transformed measurement and are arranged in decreasing order. The filter factors with values < 1 diminish the influence of insignificant coefficients. Accordingly, an increasing value of τ smoothly cuts away the high-order coefficients and reduces the number of degrees of freedom. In turn, the choice of the number of degrees of freedom specifies the number of effective filter factors and thus the regularization strength.
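Because every summand in Eq. (12) decreases monotonically with τ, a chosen n_df can be converted into τ by a simple bisection. The sketch below assumes the eigenvalues S_jj are already known; eigenvalues equal to zero (components the curvature does not penalize, such as constant or linear shapes) always contribute 1, so n_df cannot drop below their count. The function names and eigenvalues are hypothetical.

```python
def n_df(tau, S):
    """Eq. (12): effective number of degrees of freedom for eigenvalues S_jj >= 0."""
    return sum(1.0 / (1.0 + tau * s) for s in S)


def tau_for_ndf(target, S, iterations=200):
    """Regularization parameter tau with n_df(tau) = target, found by bisection.

    n_df decreases monotonically from m (tau = 0) towards the number of zero
    eigenvalues (tau -> infinity), so the target must lie in between.
    """
    n_zero = sum(1 for s in S if s == 0.0)
    if not n_zero < target <= len(S):
        raise ValueError("target must satisfy n_zero < target <= m")
    lo, hi = 0.0, 1.0
    while n_df(hi, S) > target:      # widen the bracket until it contains the target
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if n_df(mid, S) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


S = [0.0, 0.0, 0.5, 2.0, 8.0, 32.0]   # hypothetical eigenvalues in increasing order
tau = tau_for_ndf(4.0, S)
print(tau, n_df(tau, S))               # n_df(tau) is approximately 4
print([round(1.0 / (1.0 + tau * s), 3) for s in S])   # filter factors, decreasing
```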

To obtain Eq. (12), the Hessian and curvature matrices in Eq. (11) have to be diagonalized simultaneously. To do this, a common transformation matrix has to be found which transforms the Hessian matrix into a unit matrix and diagonalizes the curvature matrix [5].
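One standard way to find such a transformation is sketched below: diagonalize H = U D U^T, rescale the eigenvectors with D^(-1/2) so that H becomes the unit matrix, then diagonalize the transformed curvature matrix with an orthogonal rotation, which leaves the unit matrix unchanged. The sketch uses a small cyclic Jacobi eigenvalue routine to stay dependency-free; it is an assumption about the procedure, not a transcription of RUN or TRUEE, and the function names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def jacobi_eigh(M, tol=1e-24, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix, cyclic Jacobi."""
    n = len(M)
    A = [list(map(float, row)) for row in M]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-300:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, t * c, -t * c
                A = matmul(transpose(J), matmul(A, J))   # rotation that zeroes A[p][q]
                V = matmul(V, J)
    return [A[i][i] for i in range(n)], V


def simultaneous_diagonalize(H, C):
    """T and S with T^T H T = I and T^T C T = diag(S), S in increasing order.

    H must be symmetric positive-definite, C symmetric positive-semidefinite.
    """
    n = len(H)
    d, U = jacobi_eigh(H)
    T1 = [[U[i][j] / math.sqrt(d[j]) for j in range(n)] for i in range(n)]   # T1^T H T1 = I
    S, V = jacobi_eigh(matmul(transpose(T1), matmul(C, T1)))
    order = sorted(range(n), key=lambda j: S[j])
    S = [S[j] for j in order]
    V = [[row[j] for j in order] for row in V]
    return matmul(T1, V), S


H = [[4.0, 1.0], [1.0, 3.0]]        # hypothetical Hessian
C = [[1.0, -1.0], [-1.0, 1.0]]      # hypothetical curvature matrix with a null direction
T, S = simultaneous_diagonalize(H, C)
print(S)                                     # approximately [0.0, 9/11]
print(matmul(transpose(T), matmul(H, T)))    # approximately the unit matrix
```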

A lower limit of the parameter τ can be estimated by testing the statistical relevance of the eigenvalues of the response matrix. Using Eq. (12), the number of degrees of freedom has to be chosen such that τ is above the suggested limit, in order to avoid the suppression of significant components in the solution.

2. TRUEE

Several algorithms have been developed for solving inverse problems in different categories. One of them is RUN (Regularized UNfolding), which uses the mathematics outlined in Sec. 1. RUN was developed in the 1980s in FORTRAN 77 and was updated several times, the last time around 1995 [6]. The RUN algorithm has been converted to a ROOT-based C++ version. Furthermore, it has been equipped with additional functions and user-friendly extensions. This new software package is called TRUEE (Time-dependent Regularized Unfolding for Economics and Engineering problems).

The algorithm can process event-wise data input by reading single n-tuples of variables. This flexibility permits an individual determination of the response matrix for every specific case, in contrast to algorithms which can only deal with histograms as input. Additionally, supplementary cuts or event weights can be applied internally without changing the input data files. Furthermore, the availability of individual event information allowed the development of a method to verify the unfolding result (see Sec. 2.4). A set of up to three observables can be used to perform the unfolding fit. Thus, the precision of the estimated function can be enhanced by choosing three observables with complementary information content.

RUN and TRUEE deliver the same results in terms of both data points and uncertainty estimates. Fig. 1 demonstrates this by comparing both algorithms on the unfolding of a simulated distribution. The true distribution and the observable are shown as well, illustrating the finite resolution and limited acceptance of the simulated measurement. In the bottom panel, the ratios of the unfolded bin contents show an almost perfect match between the algorithms. Minor deviations can be attributed to the different handling of floating-point variables by the two different compilers (FORTRAN and C++).

Instead of unfolding a given data sample as a whole, TRUEE can also be used to investigate changes of the investigated distribution with time. If structural breaks with respect to time are found beforehand, TRUEE can unfold time slices of the inspected data and reveal time-correlated changes in the corresponding distribution.

The installation of TRUEE is straightforward on UNIX-based operating systems, as it uses CMake [7]. The new algorithm is able to deal with two different types of input files: ASCII and ROOT files. To make the analysis procedure more convenient, new functions have been included, which are described in the following (Sec. 2.1 to 2.3). Besides the newly implemented functionalities, a well-proven RUN function for the verification of the unfolding result is presented as well. The built-in correction for the acceptance of the experimental setup and the treatment of background that is present in the measurement are likewise inherited from RUN and will also be described below.

Figure 1: Comparison of results of the original unfolding algorithm RUN (gray circles) and the new C++ version TRUEE (black squares). The solid line shows the true sought distribution. The shaded area represents the distribution of the measured observable, which has been used for the unfolding. The relative deviations of the bin contents and uncertainties from both algorithms can be seen in the lower panel and show good agreement between the unfolding results.

2.1. Selection of observables

Generally, a measured event is characterized by a large set of observables. TRUEE can deal with more than 30 different observables, of which up to three can be used for the unfolding at the same time. These should be the observables which are most strongly correlated with the variable to be unfolded. To check the dependence of the observables on this variable, correlation and profile histograms are automatically created from the MC sample. Different examples of such histograms are shown in Sec. 3.
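The numbers behind such plots can be sketched as follows: a profile histogram holds the mean and spread of an observable in bins of the true variable, and the linear correlation coefficient summarizes the dependence in one number. The function names are hypothetical and the sketch only computes the quantities, not the ROOT graphics that TRUEE produces.

```python
import math


def profile(x_values, y_values, x_edges):
    """Mean and standard deviation of y in each x bin (a profile histogram)."""
    n_bins = len(x_edges) - 1
    sums = [[0, 0.0, 0.0] for _ in range(n_bins)]   # count, sum of y, sum of y^2
    for x, y in zip(x_values, y_values):
        for i in range(n_bins):
            if x_edges[i] <= x < x_edges[i + 1]:
                sums[i][0] += 1
                sums[i][1] += y
                sums[i][2] += y * y
                break
    result = []
    for n, s, s2 in sums:
        if n == 0:
            result.append((float("nan"), float("nan")))
            continue
        mean = s / n
        result.append((mean, math.sqrt(max(s2 / n - mean * mean, 0.0))))
    return result


def pearson(x, y):
    """Linear correlation coefficient between the sought variable and an observable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


x_true = [0.5, 0.5, 1.5]        # hypothetical MC truth
observable = [1.0, 3.0, 5.0]    # hypothetical reconstructed observable
print(profile(x_true, observable, [0.0, 1.0, 2.0]))   # [(2.0, 1.0), (5.0, 0.0)]
print(pearson(x_true, observable))
```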

2.2. Parameter selection

Generally, an unfolding algorithm requires the user to set various parameters, such as the binning of the observables and of the final histogram as well as the strength of the regularization. The difficulty in selecting an optimal parameter set lies in finding a result with low correlation between the unfolded data points and low bias, which is introduced by the regularization. This outcome has to be identified among many results obtained with different parameter combinations. The three crucial parameters are

• number of bins
• number of knots
• number of degrees of freedom.

The number of knots defines the number of B-splines used in the superposition for the unfolded function (see Eq. (2)). This number is related to the internal binning of the sought distribution f(x) for the unfolding, which is chosen to be equidistant. After the estimation, the obtained f(x) is transformed into a binned distribution that represents the final result, for which the number of bins can be chosen. The individual bins can have different widths.
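The transformation of the continuous estimate f(x) into the final histogram can be sketched as an integration over each final bin, which handles bins of different widths naturally. In TRUEE, f(x) is the B-spline sum of Eq. (2); the sketch below accepts any callable and uses Simpson's rule, both assumptions made for illustration, and the function names are hypothetical.

```python
def integrate(f, lo, hi, n=200):
    """Composite Simpson rule for the integral of f over [lo, hi]; n must be even."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    total += 4.0 * sum(f(lo + (2 * k - 1) * step) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(f(lo + 2 * k * step) for k in range(1, n // 2))
    return total * step / 3.0


def bin_contents(f, edges):
    """Bin contents of a continuous unfolded distribution f(x) for arbitrary bin edges."""
    return [integrate(f, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]


# Hypothetical unfolded density f(x) = 3 x^2 on [0, 1] and three bins of unequal width.
contents = bin_contents(lambda x: 3.0 * x ** 2, [0.0, 0.5, 0.75, 1.0])
print(contents)   # approximately [0.125, 0.296875, 0.578125]
```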

The number of degrees of freedom controls the influence of the regularization by defining the parameter τ (see Eq. (12)). A low number of degrees of freedom means strong regularization and a positive correlation between the unfolded data points. Thus, the introduced bias may be too high. A large number reduces the regularization and causes implausibly large fluctuations and uncertainties. Oftentimes, these can even be larger than the bin contents due to negative correlations between data points. A balance between these extreme cases has to be found. In general, the number of degrees of freedom should be roughly the same as the number of bins, as this ensures that, on the one hand, no information contained in the measurement is discarded and, on the other hand, no positive correlations are introduced by solving an underdetermined system.

To facilitate the task of choosing a good combination of the three parameters outlined above, TRUEE provides histograms which show a quality value k that indicates whether the correlations among the unfolded data points can be neglected. For each number of bins, one such histogram is provided, where the number of knots and the number of degrees of freedom are the two dimensions of the histogram. In these histograms, the parameter region with the least correlation between the data points can be identified. An example of such a chart is shown in Fig. 2.

The displayed correlation-related value k is the result of a test, developed within the original RUN algorithm, which determines whether the covariance matrix can be considered diagonal. Within the test, 5 000 multivariate Gaussian deviations of the unfolded result are randomly generated using the full covariance matrix. Each of these deviations is compared to the unfolded result by a χ² calculation in which the covariance matrix is assumed to be diagonal. The p-values obtained from the χ² values are filled into a histogram. If the distribution of p-values is flat, the covariance matrix can be considered diagonal and the correlations between the unfolded data points are negligible. The value k describes the flatness of the p-value histogram. It is computed as the sum of the absolute residuals between the obtained and a flat p-value distribution, divided by the number of bins of the p-value histogram.
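A minimal sketch of this test follows. It assumes 10 p-value bins and a histogram normalized to unit total (the text specifies neither), and it uses closed-form χ² tail probabilities for an integer number of degrees of freedom. Because the χ² only depends on the deviation from the unfolded result, only the covariance matrix enters. The function names are hypothetical.

```python
import math
import random


def chi2_sf(x, dof):
    """P(chi^2 > x) for an integer number of degrees of freedom (closed forms)."""
    if x <= 0.0:
        return 1.0
    if dof % 2 == 0:
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total)


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive-definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def k_value(cov, n_samples=5000, n_pbins=10, seed=1):
    """Flatness of the p-value histogram: 0 for a perfectly flat one, larger if correlated."""
    rng = random.Random(seed)
    n = len(cov)
    L = cholesky(cov)
    counts = [0] * n_pbins
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Deviation from the unfolded result drawn with the full covariance matrix.
        dev = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        # chi^2 that ignores the off-diagonal elements.
        chi2 = sum(dev[i] ** 2 / cov[i][i] for i in range(n))
        p = chi2_sf(chi2, n)
        counts[min(int(p * n_pbins), n_pbins - 1)] += 1
    flat = 1.0 / n_pbins
    return sum(abs(c / n_samples - flat) for c in counts) / n_pbins


diag = [[4.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
corr = [[4.0 if i == j else 3.6 for j in range(4)] for i in range(4)]   # correlation 0.9
print(k_value(diag), k_value(corr))   # the correlated covariance gives the larger k
```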




Figure 2: The correlation-related value k, as a quality factor for the unfolding result, color-coded in the two-dimensional histogram of varying number of knots and number of degrees of freedom. In this example, the best results with the lowest correlations are located in the range between 9 and 11 degrees of freedom. The strong dependence of the correlation on the regularization, expressed by the number of degrees of freedom, is clearly visible.

2.3. Test mode

To find an optimal set of unfolding parameters and to check whether the unfolding works well with the selected observables, a test mode has been implemented as an additional tool. While running in test mode, only the simulated event sample given to the program is considered. In unfolding mode, this sample is only used to determine the detector response matrix. In test mode, a fraction of events from the simulated sample can be selected to serve as a pseudo-data sample, which is subsequently unfolded. The rest of the simulated events is used to determine the response matrix in the usual way. Since the true distribution of the variable to be unfolded is known in a simulation, it is possible to test whether the unfolded distribution matches the true one. The comparison between the unfolded and the true distributions is performed with a Kolmogorov-Smirnov test [8] and a χ² test. Histograms showing the agreement of the distributions for each combination of number of knots and degrees of freedom are provided. In test mode, an additional parameter selection method can be used by plotting k versus the χ² value of the test for each parameter set. The parameter sets with minimal correlation among the data points and a good fit can be found where both k and the χ² value are small. An example is given in Fig. 3. The parameter setting which shows the best agreement in the test unfolding should be used for the actual unfolding of the real data.
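The test-mode ingredients can be sketched as follows, under the assumptions that MC events are split at random and that the χ² uses the unfolding uncertainty of each bin. The function names are hypothetical, and `ks_distance` only returns the Kolmogorov-Smirnov distance between the binned cumulative distributions, not a p-value; within ROOT, `TH1::KolmogorovTest` and `TH1::Chi2Test` provide complete tests on histograms.

```python
import random


def split_for_test_mode(events, fraction, seed=0):
    """Randomly split MC events into a pseudo-data sample and a response-matrix sample."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    n_pseudo = round(fraction * len(shuffled))
    return shuffled[:n_pseudo], shuffled[n_pseudo:]


def chi2(unfolded, errors, true):
    """chi^2 between unfolded and true bin contents, using the unfolding uncertainties."""
    return sum(((u - t) / e) ** 2 for u, t, e in zip(unfolded, true, errors))


def ks_distance(h1, h2):
    """Largest distance between the normalized cumulative distributions of two histograms."""
    n1, n2 = sum(h1), sum(h2)
    c1 = c2 = d = 0.0
    for x1, x2 in zip(h1, h2):
        c1 += x1 / n1
        c2 += x2 / n2
        d = max(d, abs(c1 - c2))
    return d


pseudo, training = split_for_test_mode(range(1000), fraction=0.3)
true = [100.0, 80.0, 50.0, 20.0]         # hypothetical true bin contents
unfolded = [104.0, 78.0, 47.0, 22.0]     # hypothetical unfolded result
errors = [10.0, 9.0, 7.0, 5.0]           # hypothetical unfolding uncertainties
print(len(pseudo), len(training), chi2(unfolded, errors, true), ks_distance(unfolded, true))
```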
Figure 3: The correlation-related value k versus the χ² value from the comparison of the unfolded result with the true distribution. The entries are labeled with the number of degrees of freedom. The optimal result can be found in the region of small k and small χ² values, here 9 degrees of freedom. The additional variation of the numbers of knots and numbers of bins is not shown in this figure.

2.4. Verification

Generally, the distributions of detector observables in the MC do not necessarily match those in measured data. After the unfolding result has been obtained, the consistency of the unfolding and the MC simulation can be verified by comparing the distributions of individual observables of real data with a weighted MC sample. To do this, the MC sample which has been used to calculate the response matrix is weighted according to the distribution in the variable x that is seen in the unfolding result. Since it then follows the same distribution in x, the resulting MC sample should describe the data sample well, and all observable distributions should match between MC and data. This is especially interesting for observables that have not been considered during the unfolding fit. Histograms showing the distributions of real data and of the weighted MC sample are provided for each observable that has been introduced to the program. Examples are shown in Sec. 3.1.8 and 3.2.5.
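The reweighting step can be sketched with per-bin weights w_j = (unfolded content in bin j) / (MC content in bin j), applied to every MC event according to its true x. The assumption here is that the MC reference distribution is taken at the same level (with or without acceptance correction) as the unfolded result; the function names and numbers are hypothetical.

```python
def bin_index(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value, else None."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def event_weights(mc_x_true, x_edges, unfolded, mc_reference):
    """Weight w = unfolded[j] / mc_reference[j] for an MC event in true-x bin j.

    mc_reference is the MC distribution in x at the same level (with or without
    acceptance correction) as the unfolded result.
    """
    weights = []
    for x in mc_x_true:
        j = bin_index(x, x_edges)
        ok = j is not None and mc_reference[j] > 0
        weights.append(unfolded[j] / mc_reference[j] if ok else 0.0)
    return weights


def weighted_histogram(values, weights, edges):
    hist = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        i = bin_index(v, edges)
        if i is not None:
            hist[i] += w
    return hist


mc_x = [0.2, 0.4, 0.6, 1.3, 1.7]          # hypothetical true x of MC events
mc_obs = [5.0, 6.0, 5.5, 8.0, 9.0]        # an observable not used in the unfolding fit
w = event_weights(mc_x, [0.0, 1.0, 2.0], unfolded=[6.0, 8.0], mc_reference=[3.0, 2.0])
print(weighted_histogram(mc_obs, w, [4.0, 7.0, 10.0]))   # [6.0, 8.0], to compare with data
```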

2.5. Acceptance Correction

For the reconstruction of the initial distribution f(x), which describes the sought physical quantity, it is necessary to consider the limited acceptance and the loss of events due to quality selection during the analysis. TRUEE can perform a corresponding correction if the user supplies the function describing the generated MC event distribution. The acceptance of the measurement, which can depend on the variable to be unfolded, is defined as the ratio of the MC event distribution at the final analysis level to the generated MC event distribution. TRUEE determines this acceptance for each bin of the sought variable and applies it during the unfolding of the distribution.
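The per-bin acceptance and its use can be sketched as follows. TRUEE applies the acceptance during the unfolding; for illustration, the sketch below applies it afterwards to binned counts, which is a simplification. The function names and numbers are hypothetical.

```python
def acceptance(n_final, n_generated):
    """Per-bin acceptance: MC events at final analysis level / generated MC events."""
    return [f / g if g > 0 else 0.0 for f, g in zip(n_final, n_generated)]


def correct_for_acceptance(unfolded, acc):
    """Convert unfolded counts at analysis level into counts before the event losses."""
    return [u / a if a > 0 else float("nan") for u, a in zip(unfolded, acc)]


acc = acceptance(n_final=[50, 300, 400], n_generated=[1000, 1000, 800])
print(acc)                                           # [0.05, 0.3, 0.5]
print(correct_for_acceptance([10.0, 60.0, 40.0], acc))
```

Bins with very small acceptance, such as the first one here, turn small unfolded counts into large corrected ones, so their uncertainties grow by the same factor.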

2.6. Consideration of background

If background is present in the measurement process, it has to be taken into account during the unfolding. Given a background event sample, TRUEE performs a corresponding correction: the detector observable distributions of the background sample are added to the expectation (the term b in Eq. (7)), so that the background is considered during the unfolding fit.
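How a background event sample enters the expectation of Eq. (7) can be sketched as below: the background events are histogrammed in the observable bins, scaled to the measurement and added to A a. The normalization factor `scale` and the function names are assumptions for illustration; for a single "off" region equivalent to the "on" region, as in the application in Sec. 3.1, the factor is 1.

```python
def histogram(values, edges):
    """Counts of values in the half-open bins [edges[i], edges[i+1])."""
    counts = [0.0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1.0
                break
    return counts


def expectation_with_background(A, a, background_values, y_edges, scale=1.0):
    """Expected counts g = A a + b, with b taken from a background event sample.

    scale normalizes the background sample to the measurement (for example an
    exposure ratio); it is 1 for a single "off" region equivalent to the "on" region.
    """
    b = [scale * c for c in histogram(background_values, y_edges)]
    return [sum(A_ij * a_j for A_ij, a_j in zip(row, a)) + b_i for row, b_i in zip(A, b)]


A = [[0.5, 0.25], [0.25, 0.5]]                  # hypothetical response matrix
background = [0.2, 0.7, 1.1, 1.4, 1.6, 1.9]     # observable values of background events
print(expectation_with_background(A, [40.0, 80.0], background, [0.0, 1.0, 2.0]))   # [42.0, 54.0]
```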

3. Application of TRUEE in astroparticle physics experiments

Many ground-based astroparticle physics detectors suffer from the fact that it is not possible to directly measure the primary particles and their properties. Indirect detection methods are necessary, which instead utilize atmospheric particle showers or measurements of secondary particles. The correlation between the distributions of the observables derived in this way and the distribution of the sought quantities is usually complex and ambiguous. Moreover, detection processes are affected by limited acceptance. For example, the original particle's energy and direction are folded with the interaction cross sections and the response of the detector. Thus, the application of unfolding methods is necessary to determine the distribution of the sought variable. In this section, we present the application of TRUEE in the astroparticle experiments MAGIC [9] and IceCube [10].
3.1. The MAGIC telescopes

3.1.1. The experiment

The Major Atmospheric Gamma-ray Imaging Cherenkov (MAGIC) telescopes are a stereoscopic system of two Cherenkov telescopes situated at the Roque de los Muchachos on the Canary Island of La Palma. MAGIC started its operation as a single-telescope experiment in 2004 and was later upgraded to a stereoscopic system, which has been operational since late 2009.

In the standard operation mode, the experiment covers the energy range from 50 GeV to several tens of TeV in cosmic gamma-rays. Measurements of the gamma-ray flux at these energies give insight into a large set of highly energetic astronomical sources, such as supernova remnants, active galactic nuclei and potentially gamma-ray bursts. Besides source studies, gamma-rays also allow the investigation of the extragalactic background light and of more exotic phenomena, such as in the search for dark matter particles.

Ground-based gamma-ray detectors like MAGIC exploit the Earth's atmosphere as their detection volume. High-energy gamma-rays reaching Earth cause atmospheric particle showers. These are accompanied by Cherenkov radiation, which can be detected by the telescope cameras. The number of Cherenkov photons, along with the reconstruction of the shower geometry, yields an energy estimate of each incident gamma-ray.

Unfortunately, these events are outnumbered by a huge background of hadronically induced particle showers, which have to be separated statistically from the sought gamma-ray showers in the course of the so-called gamma/hadron separation [12, 13].

Additional background from diffuse electrons or gamma-rays also influences the measurement. To determine the size of the remaining background, off-source measurements are taken. A convenient way to do this is the so-called Wobble observation mode, which permits the simultaneous observation of the signal region and a background region [14]. The parameter θ², the squared angular distance in the telescope camera between the expected source position and the reconstructed source position of each event, defines an "on" region and an "off" region in the camera. In this way, the background events are recorded at the same time as the signal events. It is possible to define more than one "off" position to increase the precision of the background measurement. However, for clarity, only one "off" position is used in the application presented here. In this case, the excess of "on" over "off" events can be evaluated without further normalization.
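The excess evaluation can be sketched as N_on - α N_off, where α is the ratio of "on" to "off" exposure. With one "off" region of the same size, α = 1 as stated above; the choice α = 1/n for n equivalent "off" regions, the simple Poisson error propagation and the function name are assumptions for illustration.

```python
import math


def excess(n_on, n_off, alpha=1.0):
    """Excess N_on - alpha * N_off and its Poisson uncertainty.

    alpha is the ratio of "on" to "off" exposure, e.g. 1/n for n equivalent "off" regions.
    """
    n_excess = n_on - alpha * n_off
    sigma = math.sqrt(n_on + alpha ** 2 * n_off)
    return n_excess, sigma


print(excess(150, 50))                # one "off" region: no further normalization needed
print(excess(150, 150, alpha=1 / 3))  # three "off" regions
```

Using several "off" regions lowers the α² N_off term and therefore the uncertainty of the excess, which is the precision gain mentioned in the text.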

3.1.2. Spectrum reconstruction procedure

During the analysis of MAGIC data with the analysis package MARS [15], the recorded shower images are calibrated, cleaned and characterized by so-called image parameters [16]. Among these are the width and the length, describing the root mean square spread of light perpendicular to and along the main axis of the image, respectively, the light content of the shower (size) and the fraction of light contained in the brightest pixel compared to the total amount of light in the image (concentration). Some of these parameters are combined into the estimated energy using the statistical learning method of Random Forest [17]. By construction, this parameter has a very good correlation with the true energy. Similarly, for the best possible gamma/hadron separation, a parameter called hadronness is built, which describes the probability of each event to be of hadronic origin.

To obtain a differential spectrum with respect to the true gamma-ray energy, an unfolding procedure is applied, using one or several of the observables at hand. The training of the Random Forests and the unfolding procedure require Monte Carlo simulated events, which must have undergone the same analysis procedure up to this point. The simulations used for the analysis of MAGIC data are produced with the air shower simulation package CORSIKA [18], followed by the detector simulation [19].



At present, the standard MAGIC analysis performs the reconstruction and unfolding procedure in two subsequent steps. First, data events which contain all the aforementioned parameters are read, and cuts are applied on hadronness, to select gamma-like events, on the sky coordinates of the recorded events, and on θ², to separate events from the on- and the off-source measurements. Monte Carlo simulated events are used to determine the acceptance of the detector, the effective area, and the migration matrix between the true energy and the observable estimated energy. The output is an energy spectrum in the estimated energy and the migration matrix for the chosen binning.

In a second step, different unfolding algorithms with different regularization methods can be applied in order to produce a spectrum with respect to the true energy [20]. Among the offered regularizations, the methods by Bertero [21], Schmelling [22] and Tikhonov [4] can be used. The unfolding is performed using the previously generated migration matrix. Furthermore, several functions can be fitted to the obtained spectrum, taking into account the correlation between the unfolded data points.
There are several limitations within this procedure. The basis for the unfolding is fixed to the already binned histogram provided in the first step of the spectral reconstruction. Thus, an optimization of the binning is not possible during the unfolding process. Furthermore, the estimated energy is the only available observable. Additional observables, which might yield complementary information, are no longer accessible. The application of TRUEE offers an alternative approach to the whole reconstruction and unfolding process, which can avoid these limitations.

375 3.1.3. Application of TRUEE 

As described in Section 2, TRUEE can use up to three observables for the unfolding. It reads the data sample event by event instead of ready-made histograms, which gives the program the freedom to choose an optimal binning for all observables. TRUEE also performs the acceptance correction of the detector. Moreover, it can account for the background of the measurement within the unfolding process by using a background event sample. In the standard MARS analysis, the background is treated prior to the unfolding.
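Because the events themselves are available, the observable histogram can be rebuilt with any binning. The following Python sketch is illustrative only and is not TRUEE's C++ implementation: it bins events in up to three observables and flattens the multi-dimensional bin index into a single vector, which is one common way of feeding several observables into a response matrix. The observable names, bin edges and event values are hypothetical.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin containing value for sorted edges, or None if outside."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def flattened_histogram(events, edges_per_observable):
    """Count events (tuples of up to three observables) on a rectangular grid.

    The multi-dimensional bin index is flattened in row-major order (the last
    observable varies fastest), giving one vector of observable bins.
    """
    n_bins = [len(edges) - 1 for edges in edges_per_observable]
    size = 1
    for n in n_bins:
        size *= n
    counts = [0] * size
    for event in events:
        flat = 0
        for value, edges, n in zip(event, edges_per_observable, n_bins):
            idx = bin_index(value, edges)
            if idx is None:
                break  # outside the binned range in this observable: drop event
            flat = flat * n + idx
        else:
            counts[flat] += 1
    return counts


# Hypothetical events: (log10 estimated energy, concentration, zenith angle in deg)
events = [(2.1, 0.3, 10.0), (2.6, 0.7, 30.0), (3.5, 0.5, 20.0)]
coarse = [[2.0, 2.5, 3.0], [0.0, 0.5, 1.0], [5.0, 20.0, 35.0]]
fine = [[2.0, 2.25, 2.5, 2.75, 3.0], [0.0, 0.5, 1.0], [5.0, 20.0, 35.0]]
print(flattened_histogram(events, coarse))
print(flattened_histogram(events, fine))
```

The same event list is binned twice with different edges, which is exactly what a fixed, pre-binned histogram would not allow.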

Because TRUEE reads individual events, it has to enter the analysis at an earlier stage than the current unfolding tool. This is feasible because the background events do not need to be treated before TRUEE is applied. In addition, the preliminary histograms are built inside the program during the unfolding process. Thus, the first step of the spectral reconstruction outlined in Section 3.1.2 is not needed for a TRUEE-based spectrum reconstruction. However, the analysis cuts and the selection of signal and background events still have to be carried out prior to the unfolding. Therefore, an interface has been implemented that allows TRUEE to enter the analysis workflow of the experiment. The tasks performed by this interface are outlined in the following.

The interface reads the events from the data and MC simulation files, applies a cut to exclude hadron-like showers, and applies cuts to select events that belong to either the signal or the background region of the telescope camera. It creates one output file for signal events, one for background events and a third one for MC events. These files contain all parameters relevant for the unfolding as event-level branches, detached from the MAGIC data file tree structure. Furthermore, the interface reads basic information about the produced MC events and stores it in an additional tree in the MC file. From the data sample, the effective observation time is extracted and added to the signal output file in an additional tree. This additional information is needed for the acceptance correction, which is also performed within TRUEE.

After the unfolding, a script is used to extract the solution with the best combination of unfolding parameters and to apply a fitting algorithm that takes into account the correlations between the bins of the unfolded spectrum. Quoting such a fit result in addition to the unfolded data points is common practice for the presentation of energy spectra in astroparticle physics, as it facilitates the comparison of results from different analyses. The fit function can be selected by the user from several choices (power law, broken power law, etc.).
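A fit that respects the correlations between unfolded bins minimizes χ² = rᵀ V⁻¹ r with the full covariance matrix V instead of only its diagonal. The pure-Python sketch below shows one way to do this for a simple power law: it is not the script used in the MAGIC chain, and both the reference energy e0 = 1000 GeV and the first-order log-space linearization (covariance of ln Φ approximated by V_ij / (Φ_i Φ_j)) are assumptions made for illustration.

```python
import math


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-300:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_power_law(energies, flux, cov, e0=1000.0):
    """Generalized least-squares fit of flux = phi0 * (E/e0)**(-gamma).

    In log space y = ln(flux) is linear in x = ln(E/e0); the covariance of y is
    approximated to first order by cov_ij / (flux_i * flux_j).
    Returns (phi0, gamma).
    """
    n = len(energies)
    x = [math.log(e / e0) for e in energies]
    y = [math.log(f) for f in flux]
    cov_log = [[cov[i][j] / (flux[i] * flux[j]) for j in range(n)] for i in range(n)]
    w = invert(cov_log)  # weight matrix = inverse covariance
    cols = [[1.0] * n, x]  # design matrix columns: intercept and slope
    a = [[sum(cols[p][i] * w[i][j] * cols[q][j] for i in range(n) for j in range(n))
          for q in range(2)] for p in range(2)]
    b = [sum(cols[p][i] * w[i][j] * y[j] for i in range(n) for j in range(n))
         for p in range(2)]
    a_inv = invert(a)
    intercept = a_inv[0][0] * b[0] + a_inv[0][1] * b[1]
    slope = a_inv[1][0] * b[0] + a_inv[1][1] * b[1]
    return math.exp(intercept), -slope
```

The off-diagonal elements of V enter through the weight matrix w, so strongly correlated neighbouring bins are not counted as independent measurements.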
Figure 4: Correlation between the energy and the observables used for the application of TRUEE in the MAGIC analysis. Shown are scatter plots of events (left) and the corresponding profile histograms (right). An optimal correlation corresponds to a monotonically changing profile with small uncertainties. The event density is color-coded.
3.1.4. Choice of observables

For the unfolding of MAGIC data, the space of observables has been investigated. The set of observables that has proven to deliver good results is:

• The estimated energy, a combination of observables obtained by random forest regression, correlates very well with the true energy. It has been the basis of the current MAGIC unfolding and is also a valuable input for the unfolding with TRUEE.

• The parameter concentration, which describes the ratio of the light content of the brightest pixel to that of the surrounding pixels, shows a clear correlation with the true energy.

• The zenith angle is an important third input for an unfolding with TRUEE. Even though it does not correlate well with the energy, it influences the image of each event in the camera, so that events with the same energy look different if they have been recorded at different zenith angles.



The correlation of each observable with the true energy is shown in Fig. 4.
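The profile histograms of Fig. 4 summarize each scatter plot by the mean observable value and its standard error in bins of energy; a useful observable shows a monotonic profile. The Python sketch below computes such a profile (the mean per bin with the standard error of the mean) and checks monotonicity. It is an illustration with hypothetical helper names, not the ROOT code used to produce the figure.

```python
import math


def profile(x_values, y_values, edges):
    """Profile histogram: mean of y and its standard error in each x bin.

    Bins with fewer than two entries are returned as None.
    """
    groups = [[] for _ in range(len(edges) - 1)]
    for x, y in zip(x_values, y_values):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                groups[i].append(y)
                break
    result = []
    for ys in groups:
        n = len(ys)
        if n < 2:
            result.append(None)
            continue
        mean = sum(ys) / n
        variance = sum((v - mean) ** 2 for v in ys) / (n - 1)
        result.append((mean, math.sqrt(variance / n)))
    return result


def is_monotonic(prof):
    """True if the filled bin means of a profile strictly increase or decrease."""
    means = [p[0] for p in prof if p is not None]
    steps = [b - a for a, b in zip(means, means[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)
```

An observable such as the zenith angle would typically fail the monotonicity check, which matches the remark above that it is included for its influence on the image rather than for a direct energy correlation.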

3.1.5. Acceptance correction

The data events read by TRUEE are only those that triggered the telescopes and survived the analysis cuts. Similarly, the MC set only comprises events that remain after a simulated trigger and the same analysis cuts. In other words, the measurement suffers from a loss of events compared to the initially arriving particles. The ratio between the distributions of the surviving and the arriving particles is given by the acceptance of the measurement process. As explained in Section 2.5, TRUEE can deduce this acceptance and apply an appropriate correction during the unfolding in order to obtain the true distribution of the sought quantity. For MAGIC data, this is generally the differential flux of gamma rays, i.e. the number of particles per unit area, time and energy. Thus, the MC distribution has to be expressed in the same way. The area in which the MC events are generated is a circle whose radius is the so-called maximum impact parameter $r$. As MC events do not have a density in time, this factor has to be determined by the following consideration: to obtain the actual factor between the number of data events collected within the observation time and the initial flux from the simulations, MC and data events have to be related to each other. For this reason, the MC distribution is normalized to the effective observation time $T_{\mathrm{obs}}$ in which the real data sample was collected.
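As a simplified, bin-wise illustration of the acceptance defined above, the sketch below computes the fraction of generated MC events that survive trigger and cuts in each bin and divides unfolded counts by it. This is an assumption-laden stand-in: TRUEE applies its correction within the unfolding (Section 2.5), and the function names here are hypothetical.

```python
def acceptance(n_surviving, n_generated):
    """Per-bin acceptance: fraction of generated MC events that survive
    the (simulated) trigger and the analysis cuts. Empty bins get 0."""
    return [s / g if g > 0 else 0.0 for s, g in zip(n_surviving, n_generated)]


def correct_for_acceptance(unfolded_counts, acc):
    """Estimate the number of initially arriving particles per bin by dividing
    the unfolded counts by the acceptance (NaN where the acceptance is zero)."""
    return [n / a if a > 0 else float("nan") for n, a in zip(unfolded_counts, acc)]


acc = acceptance([10, 50], [1000, 1000])   # 1 % and 5 % of the events survive
print(correct_for_acceptance([2.0, 5.0], acc))
```

Bins with zero acceptance cannot be corrected at all, which is one reason why the MC has to populate the full energy range of interest.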

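Adding this information, the normalization constant $C$ of the MC distribution can be obtained from the number of generated particles $N_{\mathrm{gen}}$, the simulated energy range ($E_{\min}$ to $E_{\max}$) and the spectral index $\gamma$, using

$$\frac{dN}{dE\,dA\,dt} = C\left(\frac{E}{1\,\mathrm{GeV}}\right)^{-\gamma} \tag{13}$$

Integrating over energy, the generation area $A = \pi r^{2}$ and the observation time yields the number of generated events:

$$N_{\mathrm{gen}} = \int_{E_{\min}}^{E_{\max}} \int_{A} \int_{0}^{T_{\mathrm{obs}}} \frac{dN}{dE\,dA\,dt}\,dt\,dA\,dE = C\,\pi r^{2}\,T_{\mathrm{obs}}\cdot 1\,\mathrm{GeV}\cdot\frac{\left(\frac{E_{\max}}{1\,\mathrm{GeV}}\right)^{-\gamma+1} - \left(\frac{E_{\min}}{1\,\mathrm{GeV}}\right)^{-\gamma+1}}{-\gamma+1} \tag{14}$$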
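Solving Eq. (14) for $C$ is simple arithmetic; the Python sketch below does so with energies in GeV, $r$ in m and $T_{\mathrm{obs}}$ in s, so that $C$ comes out in 1/(GeV m² s). The function name and the example numbers (sample A from Tab. 1 combined with an arbitrary one-hour observation time) are hypothetical and chosen only for illustration.

```python
import math


def normalization_constant(n_gen, r_max_m, t_obs_s, e_min_gev, e_max_gev, gamma):
    """Solve Eq. (14) for C in dN/(dE dA dt) = C * (E / 1 GeV)**(-gamma).

    Energies in GeV, maximum impact parameter in m, time in s; returns C in
    1/(GeV m^2 s). Valid for gamma != 1 (note 1 - gamma == -gamma + 1).
    """
    area = math.pi * r_max_m ** 2
    energy_integral = (e_max_gev ** (1 - gamma) - e_min_gev ** (1 - gamma)) / (1 - gamma)
    return n_gen / (area * t_obs_s * energy_integral)


# Illustrative values: MC sample A (Tab. 1) and a hypothetical 1 h observation time
c = normalization_constant(7_893_000, 350.0, 3600.0, 10.0, 30000.0, 1.6)
print(f"C = {c:.3e} per GeV m^2 s")
```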
When a MC sample is unfolded, the flux of particles per unit time is not a meaningful quantity, as neither the auxiliary MC sample nor the target sample carries information on a time density. Hence, in that case the distribution of particles per unit area and energy is used.

3.1.6. Unfolding of MC spectra

The unfolding of MC data is presented in several steps. First, a test unfolding is performed. In a second step, a different MC sample is used as pseudo data. These unfolding procedures only handle the distribution of events that remain after the trigger simulation and the analysis cuts. Subsequently, two examples of an unfolding with a correction for the detector acceptance are shown, which yield a differential flux spectrum, i.e. in the case of MC the initial number of particles per energy and area generated in the MC simulation. The MC samples used are summarized in Tab. 1. The features specified there are the spectral index of the generated power-law distribution, the zenith angle range covered by the simulated events, and the maximum impact parameter r, which specifies the area over which the generated events are distributed. Furthermore, the number of originally generated events is given, as well as the number of events that survive the trigger and the analysis cuts.

| MC sample | Spectral index γ | Zenith angle range | Impact parameter r | No. generated | No. after trigger and cuts |
|---|---|---|---|---|---|
| A | 1.6 | 5°–35° | 350 m | 7 893 000 | 272 283 |
| B | 2.6 | 5°–35° | 350 m | 10 000 000 | 24 444 |
| C | 2.6 | 5°–35° | 350 m | 40 000 000 | 100 224 |

Table 1: Summary of the Monte Carlo samples used for the unfolding of MAGIC Monte Carlo and data events.

Following the above scheme, a test unfolding is performed first, using only MC sample A: 10 % of the MC events are taken as pseudo data to be unfolded, and 90 % are used for the response matrix and the acceptance calculation. A wide range of unfolding parameters is probed, and the combination that yields the smallest inter-bin correlation is chosen. For the analysis presented here, which results in a spectrum with 16 bins, these are 21 knots and 13 degrees of freedom. The resulting MC spectrum is shown in Fig. 5, together with the true event distribution. This direct comparison of the unfolded and the true distribution allows one to verify the choice of the unfolding parameters.
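The test unfolding above involves two mechanical steps: a random 10 %/90 % split of the MC sample and the selection of the parameter combination with the smallest inter-bin correlation. The Python sketch below illustrates both under stated assumptions: the average absolute off-diagonal correlation coefficient is a hypothetical stand-in for TRUEE's actual selection criterion, the fixed random seed is arbitrary, and the covariance matrices would in practice come from the unfolding runs themselves.

```python
import math
import random


def split_sample(events, fraction=0.1, seed=1):
    """Randomly split MC events into a pseudo-data part (fraction) and the
    remainder used for the response matrix and acceptance calculation."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    n_pseudo = int(round(fraction * len(shuffled)))
    return shuffled[:n_pseudo], shuffled[n_pseudo:]


def mean_abs_correlation(cov):
    """Average |rho_ij| over the off-diagonal elements of a covariance matrix."""
    n = len(cov)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += abs(cov[i][j]) / math.sqrt(cov[i][i] * cov[j][j])
                count += 1
    return total / count if count else 0.0


def pick_parameters(results):
    """results maps (n_knots, n_dof) -> covariance matrix of the unfolded bins;
    return the key whose result has the smallest mean inter-bin correlation."""
    return min(results, key=lambda key: mean_abs_correlation(results[key]))
```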

A successful test unfolding delivers a good reproduction of the input spectrum and information about which parameter combinations give reliable results. In this case, however, the MC and the pseudo data events follow the same distribution, as they stem from the same MC sample. As the simulated and the real distribution of the data are in general not equal, the performance for different distributions in MC and pseudo data needs to be investigated. For this purpose, a second MC unfolding is carried out, this time using MC sample B as pseudo data, while the whole sample A is used to calculate the response matrix. The two MC sets have different energy distributions: their spectral indices differ by 1.0. The number of generated events in sample B is higher, but due to the steeper spectrum and the decrease of the trigger efficiency towards low energies, the final sample is one order of magnitude smaller than MC sample A. An MC-to-data event ratio of about 10 is also desirable for the unfolding of real data.

Figure 5: Event distribution obtained with an unfolding of MC events in TRUEE's test mode. The unfolding result (red points with error bars) is compared to the true distribution known from the MC (blue/dashed curve).

The unfolding is carried out with the same binning of the observables and the same combination of unfolding parameters as the test unfolding shown above. The result is shown in Fig. 6.

Figure 6: Unfolded event spectrum of MC events. MC sample B with spectral index γ_B = 2.6 has been unfolded as pseudo data, using MC sample A with spectral index γ_A = 1.6. The unfolded points of the event spectrum (red points with error bars) and the original distribution (blue/dashed curve) are shown.

After the unfolding of triggered event distributions, the unfolding of a MC sample with acceptance correction is shown. For this purpose, MC sample B again serves as pseudo data, while sample A forms the MC sample for the determination of the response matrix. Additionally, the initial distribution of sample A is given as

$$\frac{dN}{dE\,dA} = \frac{N_{\mathrm{gen}}\,(-\gamma+1)}{\pi r^{2}\cdot 1\,\mathrm{GeV}\cdot\left[\left(\frac{E_{\max}}{1\,\mathrm{GeV}}\right)^{-\gamma+1} - \left(\frac{E_{\min}}{1\,\mathrm{GeV}}\right)^{-\gamma+1}\right]}\left(\frac{E}{1\,\mathrm{GeV}}\right)^{-\gamma} \tag{15}$$

with $E_{\min} = 10$ GeV and $E_{\max} = 30\,000$ GeV. For the remaining quantities, see Tab. 1.
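Eq. (15) can be evaluated directly with the sample A values of Tab. 1. The Python sketch below (hypothetical function names, energies in GeV, r in m) returns the generated density and, by integrating Eq. (15) over a bin and over the generation area, the number of generated events per true-energy bin, which is the denominator of a bin-wise acceptance.

```python
import math


def generated_density(e_gev, n_gen, r_m, e_min, e_max, gamma):
    """Eq. (15): dN/(dE dA) of the generated MC events in 1/(GeV m^2)."""
    norm = (e_max ** (1 - gamma) - e_min ** (1 - gamma)) / (1 - gamma)
    return n_gen / (math.pi * r_m ** 2 * norm) * e_gev ** (-gamma)


def generated_in_bin(e_lo, e_hi, n_gen, e_min, e_max, gamma):
    """Generated events with true energy in [e_lo, e_hi), i.e. Eq. (15)
    integrated over the bin and over the area pi * r**2."""
    return n_gen * (e_hi ** (1 - gamma) - e_lo ** (1 - gamma)) / (
        e_max ** (1 - gamma) - e_min ** (1 - gamma))


# MC sample A (Tab. 1): N_gen = 7 893 000, gamma = 1.6, r = 350 m, 10 GeV to 30 TeV
edges = [10.0, 100.0, 1000.0, 10000.0, 30000.0]
for lo, hi in zip(edges, edges[1:]):
    print(lo, hi, round(generated_in_bin(lo, hi, 7_893_000, 10.0, 30000.0, 1.6)))
```

Because the bin contents telescope, they add up to $N_{\mathrm{gen}}$ when the bins cover the full simulated range.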

The unfolding itself is performed with the same binning of the observables and the same unfolding parameters. Figure 7 shows the unfolded distribution and the initial MC function. It should be noted that, while good agreement is achieved at intermediate energies, the distribution at lower energies appears to be systematically underestimated. This effect is caused by the fact that the acceptance correction refers to the center of gravity within each bin of the initial MC distribution. For large differences between the spectrum of the MC sample used to determine the response matrix and the spectrum obtained by the unfolding, the relative shift between the centers of gravity of the two distributions is no longer negligible. However, this effect can be corrected if, in the case of such discrepancies, a second unfolding step with acceptance correction is applied, using a re-weighted MC spectrum that is closer to the result of the first step.
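The center-of-gravity effect and the suggested re-weighting can be made concrete for pure power laws. In the Python sketch below, the mean energy inside a bin is ⟨E⟩ = ∫E^(1−γ) dE / ∫E^(−γ) dE, and per-event weights E^(γ_old − γ_new) turn an E^(−γ_old) MC sample into an E^(−γ_new) one. The bin [10, 20] GeV and the function names are hypothetical; the indices 1.6 and 2.6 are those of samples A and B.

```python
def center_of_gravity(e_lo, e_hi, gamma):
    """Mean energy in [e_lo, e_hi] for a spectrum proportional to E**(-gamma).

    <E> = int E^(1-gamma) dE / int E^(-gamma) dE; valid for gamma not in (1, 2).
    """
    num = (e_hi ** (2 - gamma) - e_lo ** (2 - gamma)) / (2 - gamma)
    den = (e_hi ** (1 - gamma) - e_lo ** (1 - gamma)) / (1 - gamma)
    return num / den


def reweight(true_energies, gamma_old, gamma_new):
    """Per-event weights turning an E^-gamma_old MC sample into an E^-gamma_new
    one; the overall normalization is left arbitrary."""
    return [e ** (gamma_old - gamma_new) for e in true_energies]


print(center_of_gravity(10.0, 20.0, 1.6))  # harder spectrum (sample A)
print(center_of_gravity(10.0, 20.0, 2.6))  # steeper spectrum (sample B): lower <E>
```

The steeper spectrum places its center of gravity closer to the lower bin edge, which is the shift described above; re-weighting the MC towards the first unfolding result reduces it.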

For the study presented here, the unfolding of the pseudo data sample with acceptance correction is repeated using MC sample C (see Tab. 1), which has the same spectral index as the pseudo data. The result is shown in Fig. 8. As expected, the discrepancies previously seen at low energies do not appear in this case.

Still, the unfolding itself is only slightly affected by this dependence. It delivers good results for the event spectra even for this significant difference in the spectral indices, as can be seen in Fig. 6. The remaining discrepancy at very low energies disappears after the correction mentioned above.

3.1.7. Unfolding of a MAGIC data sample

After the successful application of TRUEE to the unfolding of MAGIC MC spectra, a proof of principle on real telescope data is given in the following. For this purpose, the standard candle of gamma-ray astronomy, the Crab Nebula, serves as an exemplary source. A test data sample, described below, is analyzed with both the standard MAGIC analysis chain and the new chain including TRUEE. Finally, the results are compared. We would like to stress that the presented analysis is not optimized for extracting any results regarding the physics of the observed source or the telescope performance. It only serves as an example of the compatibility of the two analyses. Studies on the performance of the MAGIC stereo system can be found in jo].

Figure 7: Unfolding of MC simulations as pseudo data with the built-in acceptance correction of TRUEE. MC sample B is unfolded as pseudo data, while MC sample A serves as MC in the unfolding. The red points with error bars represent the result of the unfolding. The blue solid line shows the true initial MC distribution.

The data sample comprises 7.3 hours of Crab Nebula observations taken with the MAGIC telescopes in Wobble observation mode.
Figure 8: Unfolding of MC simulations as pseudo data with the built-in acceptance correction of TRUEE. Unlike in Fig. 7, the pseudo data sample MC B is unfolded using MC sample C, which has the same spectral slope. The result of the unfolding is given by red points with error bars. The blue solid line represents the true initial MC distribution.

The preparation of the data, including the conversion of the extracted charge into the number of photons at the photodetector, the cleaning of the shower images and the determination of image parameters from the light distributions, is identical for both analyses. The standard MAGIC analysis and the TRUEE-based analysis diverge at the point where both the data and the MC have been preprocessed such that all events are characterized by image parameters and have been assigned an estimated energy and a hadronness.

In the current MAGIC analysis, a standard analysis following [9] has been used to derive the cuts for the gamma/hadron separation and for the determination of the "on" and the "off" event samples, using the standard spectrum reconstruction tool. The obtained cuts are applied to the MC and to the data sample. MC sample A is used to determine the effective area of the measurement and to build the migration matrix. The resulting spectrum of differential flux vs. estimated energy is unfolded using the current MAGIC tool and the generated migration matrix. Several runs with different regularization methods have been performed, leading to compatible results.

Following the analysis chain presented in this paper, the event files are processed by the aforementioned interface. The applied analysis cuts are the same as those used in the standard MAGIC analysis example. TRUEE likewise uses MC sample A to calculate the migration matrix and to obtain the overall acceptance. Events assigned to the off-source measurement sample are passed to TRUEE as background events. For the case of one "off" region, no normalization in terms of weighting has to be applied to these events, as in Wobble mode the on- and off-source measurements are taken simultaneously. The chosen set of unfolding parameters, namely the number of bins, the number of knots and the number of degrees of freedom, is the one that has proven to deliver good results in the MC-based unfolding procedures discussed before.

The results of the two analyses are compared in Fig. 9. The two spectra show good agreement, with deviations below 11 %.
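The lower panel of Fig. 9 shows the relative deviation between the two fitted curved power laws. The Python sketch below computes such a deviation, (f₁ − f₂)/f₂, assuming the log-parabola form f₀ (E/E₀)^(a + b log₁₀(E/E₀)) with E₀ = 300 GeV as the "curved power law"; this parametrization and all parameter values are assumptions for illustration, not the fit results of this analysis.

```python
import math


def log_parabola(e_gev, f0, a, b, e0_gev=300.0):
    """Curved power law f0 * (E/E0)**(a + b * log10(E/E0))."""
    x = e_gev / e0_gev
    return f0 * x ** (a + b * math.log10(x))


def relative_deviation(energies, params_1, params_2):
    """Relative deviation (f1 - f2) / f2 of two curved power laws, where
    params_i = (f0, a, b)."""
    return [(log_parabola(e, *params_1) - log_parabola(e, *params_2))
            / log_parabola(e, *params_2) for e in energies]


# Hypothetical parameters: two fits differing only by 5 % in normalization
fit_truee = (1.05e-10, -2.5, -0.2)
fit_mars = (1.0e-10, -2.5, -0.2)
print(relative_deviation([100.0, 1000.0, 5000.0], fit_truee, fit_mars))
```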

3.1.8. Verification

The result obtained with TRUEE has been verified in terms of the agreement between the observable distributions in data and in an accordingly weighted MC event sample. Figure 10 shows the comparison for two observables that have been used during the unfolding, while Fig. 11 displays the distributions for observables that have not been used during the unfolding. Even for observables that have not been considered in the unfolding process, the a posteriori distributions match very closely. This is a strong confirmation of the quality of the unfolding result.

Figure 9: The upper panel presents the unfolded energy spectrum of the Crab Nebula produced by the example analyses presented here. Shown are results obtained with the current MAGIC unfolding (black/dashed points with error bars) and TRUEE (red/solid points with error bars). Also shown are fits of curved power laws to the unfolded data points. The fit to the standard MARS unfolding is shown in blue/dashed, the fit to the TRUEE-unfolded points in orange/solid. The bottom panel shows the relative deviation between the two fitted functions as a function of energy.
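The verification above compares data with MC events re-weighted to the unfolded spectrum. A minimal Python sketch of this principle follows: each MC event receives the weight unfolded[k] / mc_true[k] of its true-energy bin k, and a weighted histogram of any observable, used in the unfolding or not, can then be compared with the data. Per-bin re-weighting is one simple choice assumed here; the names and the toy numbers are hypothetical.

```python
from bisect import bisect_right


def find_bin(value, edges):
    """Index of the bin containing value, or None if outside the edges."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def weights_from_unfolding(true_energies, energy_edges, unfolded, mc_true_counts):
    """Weight each MC event so that the MC true-energy distribution matches the
    unfolded one: w = unfolded[k] / mc_true_counts[k] for the event's bin k."""
    weights = []
    for e in true_energies:
        k = find_bin(e, energy_edges)
        if k is None or mc_true_counts[k] == 0:
            weights.append(0.0)
        else:
            weights.append(unfolded[k] / mc_true_counts[k])
    return weights


def weighted_histogram(values, weights, edges):
    """Histogram of an observable with each entry counted with its weight."""
    counts = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        k = find_bin(v, edges)
        if k is not None:
            counts[k] += w
    return counts


# Toy example: four MC events, two true-energy bins, one hypothetical observable
w = weights_from_unfolding([1.5, 1.6, 2.5, 2.7], [1.0, 2.0, 3.0], [4.0, 1.0], [2, 2])
print(weighted_histogram([0.1, 0.3, 0.1, 0.3], w, [0.0, 0.2, 0.4]))
```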
Figure 10: Comparison of distributions of observables that have been used in the unfolding fit. Shown are the estimated energy and concentration distributions for real data (black/dot-dashed) and the re-weighted MC (red/solid).
Figure 11: Comparison of observable distributions for real data (black/dot-dashed) and re-weighted MC (red/solid). Shown are observables that have not been considered in the unfolding fit, namely the height of the shower maximum and the width.
3.2. The IceCube neutrino observatory

3.2.1. Experiment

IceCube is a cubic-kilometer-scale neutrino detector located at the geographic South Pole. The main goal of IceCube is the investigation of cosmic rays through the detection of neutrinos. Since neutrinos have a very small interaction cross section and thus pass through large amounts of matter, a large detection volume is required to obtain neutrino-induced signals with reasonable statistics. For this reason, IceCube uses a volume of 1 km³ of the glacial ice at depths between 1 450 and 2 450 m, forming a three-dimensional grid of 5 160 digital optical modules (DOMs) equipped with photomultipliers. The DOMs are attached to strings, which are arranged in a triangular pattern with a spacing of 125 m. The detector was deployed during the Antarctic summer seasons from 2005 to 2011. During construction, data were taken each year with an n-string configuration of the partially built detector. For the analysis shown here, the IceCube 59-string configuration (IC 59) is used.

Neutrinos only undergo weak interactions and thus cannot be detected directly. Depending on the neutrino flavor, they produce secondary particles such as muons, electrons or tauons. These and other secondary charged particles induce Cherenkov light in the ice if their energy is high enough. The Cherenkov light propagates through the ice and causes signals, so-called hits, in the DOMs along the track of the secondary particle within the detector volume. From the time differences and the amount of charge in each DOM, the track of the secondary lepton can be reconstructed, and its observable values are saved as one event. Only muons have a track-like signature in the detector and provide sufficient directional information. Therefore, we consider muon neutrinos in the following.

Muons produced in the Earth's atmosphere represent the main background component. In contrast, neutrinos can pass through the Earth due to their small cross section. Thus, the Earth can be used as a filter to reduce the muon background by considering only events coming from below the horizon.

This example analysis focuses on the determination of the flux of muon neutrinos from decays of charged pions and kaons, which are in turn produced by interactions of cosmic rays with the Earth's atmosphere. Studying the spectrum of this atmospheric neutrino flux at energies beyond ~ 3 · 10^… eV can provide information about the production of charmed mesons in the atmosphere, which would appear as an enhanced neutrino flux at higher energies compared to the flux from light-meson decays. Furthermore, a flattening of the neutrino energy spectrum towards higher energies can permit conclusions about the existence of extragalactic high-energy neutrinos, as their predicted flux has a harder spectral index than the atmospheric neutrino flux. This would provide new insights into the different models of cosmic-ray production in cosmic accelerators such as Active Galactic Nuclei and Gamma Ray Bursts. Therefore, an accurate estimation of the energy spectrum is essential.

In the following, the steps of the regularized unfolding of the neutrino energy spectrum with TRUEE are demonstrated using Monte Carlo simulations and 10 % of the measured IC 59 data. This analysis serves as a proof of principle and is not intended to draw any conclusions about neutrino physics.
Figure 12: Scatter plots (left) and corresponding profile histograms (right) used to check the correlation between the energy and the observables.
3.2.2. Neutrino sample

For the analysis of the entire neutrino flux, a neutrino sample with high purity is desirable. The background contamination caused by mis-reconstructed atmospheric muons is required not to exceed 5 %, in order to keep the resulting uncertainty of the estimated energy spectrum small compared to the statistical uncertainties.

In the following, we use a neutrino sample that was obtained in the course of the atmospheric neutrino analysis. To reduce the background and to obtain a neutrino sample with a sufficiently large number of events, a series of straight cuts was applied to the data, including a zenith angle cut selecting θ between 88° and 180°. Although muon tracks from above the horizon are rejected, mis-reconstructed background events remain. Therefore, the final event selection was performed using the multivariate Random Forest method in the RapidMiner framework [23]. The final sample consists of ~ 3 000 events in the 10 % used here of the full data sample collected in one year. The corresponding MC sample, which is needed for training the event selection and for determining the response matrix during the unfolding, is produced by simulating all physical processes following the theoretical models for the cross sections and for the propagation of particles and photons through different kinds of media. The MC neutrino sample consists of more than 6 · 10^… events, which are weighted to describe the measured data observables as accurately as possible [24]. Using these event weights, the MC energy spectrum follows the atmospheric neutrino flux with a spectral index γ ≈ 3.7 predicted by Honda [25], including the contribution of prompt neutrinos from charm meson decays at higher energies (Naumov [26]). The simulated background muon sample, which is used to estimate the purity of the data sample, was produced using the air shower simulator CORSIKA.

3.2.3. Choice of observables

As a first step, energy-dependent observables are selected. The inspection of the scatter and profile histograms led to the choice of the following three observables:

• Number of DOMs that recorded at least one photoelectron. A muon with higher energy induces more Cherenkov light and has a longer track, and thus a higher probability to cause hits in DOMs.

• Number of strings that contain at least one hit DOM. This observable provides additional directional information, since the number of strings is correlated with the zenith angle of the track. Furthermore, the distances between DOMs on the same string are smaller than those between DOMs on different strings. Thus, this observable is complementary to the number of DOMs.

• The direct track length of the muon in a certain time window (MPEFit_LDirC). It is calculated by projecting the positions of the direct (not scattered) photon hits onto the reconstructed track and taking the distance between the two outermost projected points.
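The three observables listed above can be sketched in a few lines of Python. The `Hit` record and its `direct` flag are hypothetical simplifications: each listed hit is assumed to carry at least one photoelectron, and the classification of a hit as direct, which in practice depends on the time window of the reconstruction, is taken as given rather than modelled. The function names are illustrative and do not correspond to IceCube software.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    string: int
    dom: int
    position: tuple  # (x, y, z) of the DOM in metres
    direct: bool     # assumed pre-classified as an unscattered (direct) photon hit


def n_hit_doms(hits):
    """Number of distinct DOMs with at least one hit."""
    return len({(h.string, h.dom) for h in hits})


def n_hit_strings(hits):
    """Number of distinct strings containing at least one hit DOM."""
    return len({h.string for h in hits})


def direct_length(hits, track_point, track_direction):
    """Distance between the two outermost projections of direct hits
    onto the reconstructed track (0 if fewer than two direct hits)."""
    norm = math.sqrt(sum(c * c for c in track_direction))
    unit = [c / norm for c in track_direction]
    proj = [sum((p - q) * u for p, q, u in zip(h.position, track_point, unit))
            for h in hits if h.direct]
    return max(proj) - min(proj) if len(proj) >= 2 else 0.0


hits = [Hit(1, 10, (0.0, 0.0, 0.0), True), Hit(1, 11, (0.0, 0.0, -17.0), True),
        Hit(2, 5, (125.0, 0.0, -50.0), False), Hit(1, 10, (0.0, 0.0, 0.0), True)]
print(n_hit_doms(hits), n_hit_strings(hits),
      direct_length(hits, (0.0, 0.0, 0.0), (0.0, 0.0, -1.0)))
```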

The scatter and profile histograms in Fig. 12 show the dependence between the observables and the energy. In the TRUEE test mode, different binnings of the three observables have been tried in order to find the optimal unfolding result with respect to the true distribution.
3.2.4. Results

The best result obtained in test mode is shown in Fig. 13. The parameter set that delivered the resulting spectrum of 10 bins consists of 16 knots and 5 degrees of freedom. This combination of parameters is also applied to the real data.



0) i — 



10' r 

2 



i 0.5 h 
) 

! 
-0.5 



unfolded distribution 
true distribution 



4.5 5 5.5 6 

log10(energy/GeV) 



— 1 > 1 1 \ — 


1 1 


1 1 






, 1 


1 , 


1 




> 2.5 3 3.5 4 4.5 £ 


5.5 6 



Figure 13: Test mode result for the final unfolding settings. Shown are the true and the 
unfolded distribution of the part of the MC sample used for the determination of the 
response matrix. No acceptance correction is applied. The relative difference between the 
unfolded and true values is shown in the lower histogram. 

692 

The generated MC neutrino sample used to determine the detector response contains only simulated events that undergo an interaction within or close to the detector. This restriction is necessary to reduce simulation time and memory. Therefore, the generated function (here following ~ E^…) does not account for events that do not cause any signal in the detector, and it cannot be given to the unfolding algorithm to normalize the flux to the correct scale. However, since we use individual event weights for the MC neutrino sample to make it resemble the real data sample, the reweighted sample follows the atmospheric neutrino flux calculated by Honda and Naumov. Thus, this atmospheric neutrino flux function can be provided to TRUEE to describe the generated neutrino event distribution and to perform the full acceptance correction. We compare this method to the standard IceCube analysis procedure, which uses the effective area to scale the final result to the original neutrino flux [27]. This method is described in the following.

After passing all event selection steps, the final sample contains only a fraction of the neutrino events. Thus, the unfolded distribution represents only neutrinos that interacted, triggered the detector and passed the event selection (Fig. 14).
Figure 14: The unfolding of the IC 59 neutrino sample gives the distribution of selected neutrino events as a function of energy.



To calculate the neutrino flux for all neutrinos within the zenith angle range, the unfolded spectrum has to be scaled with the effective area. The effective area is the ratio between the observed event rate and the incoming flux; it depends on the properties of the selected event sample and on the energy. It includes the muon neutrino cross section, the probability for the muon to be detected, and the detector efficiency for muon detection and event reconstruction. The effective area for the current sample is shown in Fig. 15. It rises towards higher energies due to the increasing neutrino cross section and the longer tracks of neutrino-induced muons. For events with vertically upgoing tracks, the effective area decreases because of the rising probability of neutrino absorption within the Earth.

Figure 15: Effective area for the current neutrino sample as a function of neutrino energy. Shown are the areas for different zenith angle ranges and the average over the whole zenith range of 88° to 180°, which is considered in the analysis.
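Scaling with the effective area amounts to dividing each unfolded bin content by the effective area, the livetime, the bin width and the solid angle. The Python sketch below assumes a bin-averaged effective area that is already averaged over the zenith band (as for the average curve in Fig. 15) and a full-azimuth solid angle Ω = 2π(cos θ_min − cos θ_max), which is about 6.5 sr for 88° to 180°. The livetime is not given in the text and is therefore a free input; the E² weighting evaluated at the logarithmic bin centre mirrors the presentation of Fig. 16 but is an assumed convention.

```python
import math


def solid_angle(zenith_min_deg, zenith_max_deg):
    """Solid angle (sr) of a zenith band covering the full azimuth."""
    return 2 * math.pi * (math.cos(math.radians(zenith_min_deg))
                          - math.cos(math.radians(zenith_max_deg)))


def flux_from_counts(counts, energy_edges_gev, eff_area_m2, livetime_s,
                     zenith_min_deg=88.0, zenith_max_deg=180.0):
    """Differential flux in 1/(GeV m^2 s sr) from unfolded event counts,
    using one zenith-averaged effective area value per energy bin."""
    omega = solid_angle(zenith_min_deg, zenith_max_deg)
    fluxes = []
    for k, n in enumerate(counts):
        width = energy_edges_gev[k + 1] - energy_edges_gev[k]
        fluxes.append(n / (eff_area_m2[k] * livetime_s * width * omega))
    return fluxes


def e2_weighted(fluxes, energy_edges_gev):
    """E^2 * flux at the logarithmic bin centre sqrt(E_lo * E_hi)."""
    centres = [math.sqrt(a * b) for a, b in zip(energy_edges_gev, energy_edges_gev[1:])]
    return [c * c * f for c, f in zip(centres, fluxes)]


print(round(solid_angle(88.0, 180.0), 3))  # solid angle of the analysed zenith band
```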

Figure 16 shows an example of a neutrino flux spectrum that can be derived from an unfolded energy distribution of neutrino events (Fig. 14) if the effective area (Fig. 15) is known. Additionally, we present the result obtained with the internal acceptance correction, by providing the function of the atmospheric neutrino flux to TRUEE (see also Fig. 16).
Figure 16: Examples of an atmospheric neutrino energy spectrum obtained from 10% of the IC59 data unfolded with TRUEE. The two spectra are obtained using different methods of acceptance correction: the standard IceCube method using the effective area (black/solid) and the TRUEE internal acceptance correction (red/dot-dashed). The uncertainties are determined by the unfolding software using standard error propagation; systematic uncertainties are not considered in these results. The spectra are weighted with the square of the energy. The relative difference between both distributions is shown in the lower histogram.
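In the effective-area approach, the flux in each energy bin follows from the unfolded counts by dividing by the effective area, the livetime, the solid angle and the bin width, Φ_i = N_i / (A_eff,i · T · ΔΩ · ΔE_i). The sketch below illustrates only this arithmetic under stated assumptions: a zenith-averaged effective area per energy bin, full azimuthal coverage of the zenith band 88° to 180°, and E² weighting at the geometric bin centre. The helper names (`solid_angle`, `flux_from_counts`, `e2_weighted`) and all numbers are hypothetical and are not taken from the IceCube analysis or from TRUEE.

```python
import math

def solid_angle(zen_min_deg, zen_max_deg):
    """Solid angle (sr) of a zenith band [zen_min, zen_max] over full azimuth."""
    return 2.0 * math.pi * (math.cos(math.radians(zen_min_deg))
                            - math.cos(math.radians(zen_max_deg)))

def flux_from_counts(counts, a_eff, edges, livetime, omega):
    """Differential flux per bin: N_i / (A_eff,i * T * Omega * dE_i)."""
    return [n / (a * livetime * omega * (hi - lo))
            for n, a, lo, hi in zip(counts, a_eff, edges, edges[1:])]

def e2_weighted(flux, edges):
    """Multiply each flux value by E^2 at the geometric bin centre (E_c^2 = lo * hi)."""
    return [f * lo * hi for f, lo, hi in zip(flux, edges, edges[1:])]

# Hypothetical inputs (not IceCube values): energies in GeV, areas in m^2.
edges = [1e2, 1e3, 1e4, 1e5]
counts = [5000.0, 800.0, 40.0]       # unfolded events per energy bin
a_eff = [1e-2, 1e-1, 1.0]            # zenith-averaged effective area per bin
omega = solid_angle(88.0, 180.0)     # zenith range used in the analysis
flux = flux_from_counts(counts, a_eff, edges, 3.15e7, omega)  # about 1 yr livetime
print(e2_weighted(flux, edges))
```

The flux comes out in GeV⁻¹ m⁻² s⁻¹ sr⁻¹ for these input units; real analyses would additionally propagate the unfolding uncertainties through the same division.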

3.2.5. Verification

To check the quality of the unfolding, the agreement between the real data and the Monte Carlo sample weighted to the unfolded distribution is investigated. Verification histograms of two observables which have not been used for the unfolding fit are shown in Fig. 17 and Fig. 18.
Figure 17: Comparison of data (black/dot-dashed) and MC (red/solid) weighted to the unfolded function. Shown is the number of hits detected in all DOMs per event.
Figure 18: Comparison of data (black/dot-dashed) and MC (red/solid) weighted to the unfolded function. Shown is the log-likelihood value of an event reconstruction fit.

3.3. Tests on the influence of the simulation

Generally, unfolding permits the estimation of an unknown distribution. With the determined response matrix, the unfolding should be able to identify any distribution in the data, independent of the distribution of the simulations used to determine the response matrix. The only requirements are that the simulation model describes the data well enough and that all bins of the observables are filled with a sufficient number of MC events. In general, a simulated sample ten times larger than the data sample is sufficient to neglect the uncertainties in the response matrix [28].
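To make the role of the MC sample concrete, the sketch below estimates a binned response matrix from event-wise MC pairs (true value, observed value) and checks the ten-to-one MC/data rule quoted above. It is a plain histogram-based estimate for illustration only; RUN and TRUEE use their own internal representation of the sought function, which is not reproduced here. The smearing model, binning, sample sizes and function names are hypothetical.

```python
import random

def find_bin(value, edges):
    """Index of the bin containing value, or None if value lies outside the edges."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None

def response_matrix(true_vals, obs_vals, true_edges, obs_edges):
    """Binned response A[i][j] = P(observed bin i | true bin j) from event-wise MC.

    Columns are normalised to the number of MC events generated in true bin j, so
    events migrating outside the observed range reduce the column sum (acceptance).
    """
    n_obs, n_true = len(obs_edges) - 1, len(true_edges) - 1
    a = [[0.0] * n_true for _ in range(n_obs)]
    generated = [0] * n_true
    for t, o in zip(true_vals, obs_vals):
        j = find_bin(t, true_edges)
        if j is None:
            continue
        generated[j] += 1
        i = find_bin(o, obs_edges)
        if i is not None:
            a[i][j] += 1.0
    for j, n_gen in enumerate(generated):
        if n_gen:
            for i in range(n_obs):
                a[i][j] /= n_gen
    return a, generated

# Hypothetical toy MC: true value x, observable y = x smeared by a Gaussian.
rng = random.Random(3)
mc_true = [10.0 * rng.random() for _ in range(20000)]
mc_obs = [x + rng.gauss(0.0, 1.0) for x in mc_true]
true_edges = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
obs_edges = [float(k) for k in range(11)]            # finer binning in the observable
A, n_generated = response_matrix(mc_true, mc_obs, true_edges, obs_edges)

n_data = 1000                                        # hypothetical number of data events
mc_ratio = len(mc_true) / n_data
print("MC/data ratio:", mc_ratio, "(sufficient)" if mc_ratio >= 10 else "(too small)")
print("fewest MC events in a true bin:", min(n_generated))
```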



In some cases the unfolded distribution can have a very steep spectral slope, as is often the case for astroparticle problems. To ensure that enough MC events are contained in the high-energy region, the spectral slope of the simulated distribution should not deviate too much from that of the true data distribution. The following tests investigate the impact of a deviation between the spectral distributions of the MC sample and the pseudo data sample by using different spectral slopes in the simulation. We use different toy MC samples for the calculation of the response matrix and the unfolding, which describe the following distributions with respect to the arbitrary variable x:

• power law with γ = 3.7 (related to the atmospheric neutrino flux)
• power law with γ = 3.5
• power law with γ = 3.0
• power law with γ = 2.5.
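The effect of the spectral slope on the MC statistics at large x can be sketched with the four toy power laws listed above. The snippet draws each sample by inverse-transform sampling and reports where the log10(x) bins first fall below 10 events, the level discussed in this section. The x range, sample size and binning are hypothetical choices, not the settings behind Fig. 19.

```python
import math
import random

def sample_power_law(gamma, x_min, x_max, n, rng):
    """Inverse-transform sampling of p(x) ~ x^-gamma on [x_min, x_max] (gamma != 1)."""
    a, b = x_min ** (1 - gamma), x_max ** (1 - gamma)
    return [(a + rng.random() * (b - a)) ** (1 / (1 - gamma)) for _ in range(n)]

def counts_in_log_bins(values, lo, hi, n_bins):
    """Event counts in n_bins equal-width bins of log10(x) between lo and hi."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = int((math.log10(v) - lo) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def first_sparse_bin(counts, lo, width, min_events=10):
    """Lower edge (in log10 x) of the first bin holding fewer than min_events events."""
    for k, c in enumerate(counts):
        if c < min_events:
            return lo + k * width
    return None

rng = random.Random(42)
LO, HI, N_BINS = 2.0, 6.0, 20            # hypothetical log10(x) range and binning
first_sparse = {}
for gamma in (3.7, 3.5, 3.0, 2.5):       # the four slopes listed above
    sample = sample_power_law(gamma, 10 ** LO, 10 ** HI, 100_000, rng)
    counts = counts_in_log_bins(sample, LO, HI, N_BINS)
    first_sparse[gamma] = first_sparse_bin(counts, LO, (HI - LO) / N_BINS)
    print(f"gamma = {gamma}: fewer than 10 events from log10(x) = {first_sparse[gamma]:.1f}")
```

With equal sample sizes, the steeper samples run out of events at noticeably smaller x, which is why the steepest MC assumption is expected to be the weakest at the high end.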

The different MC simulations used for the determination of the response matrix are shown in Fig. 19.

We present unfolding results for two pseudo data samples with the steepest (γ = 3.7) and the flattest (γ = 2.5) power law. The results are shown in Fig. 20 and Fig. 21. The slope of the pseudo data sample is the same as or steeper than the slope of the MC samples. The maximum deviation between the spectral indices of MC and pseudo data is 1.2. The unfolding results obtained with the flatter MC assumptions are consistent with the true distribution within the uncertainties. The MC sample with the steepest spectral slope causes an underestimation of the event distribution at x-values greater than log10(x) = 3.4. This is because the response matrix is not described well enough in this region due to the low number of MC events. By contrast, in the x-region below log10(x) = 3.4 the bins contain more than 10 events, and thus good agreement of the unfolded result with the true distribution is ensured. The same is true for the unfolding of the power law distribution with γ = 2.5.

Figure 19: Four toy MC samples used for the determination of the individual response matrices. The simulated distributions have different spectral slopes.
Figure 20: Unfolding results of the simulated pseudo data sample following the power law with γ = 3.7 using different MC simulated distributions for the response matrix calculation. The true sought distribution is shown by the solid line.
Figure 21: Unfolding results of the simulated pseudo data sample following the power law with γ = 2.5 using different MC simulated distributions for the response matrix calculation. The true sought distribution is shown by the solid line.

The conclusion of this test is the recommendation to use an MC sample for the response matrix which features a similar or harder spectrum compared to the real data, especially if the unfolded distribution covers several orders of magnitude. The bins of the sought MC distribution should contain at least 10 events. If the true distribution is completely unknown, the MC spectral slope can be matched to the real data iteratively.
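The recommendation to prefer a similar or harder MC spectrum, and one reweighting step of the iterative matching, can be illustrated as follows. The sketch assumes pure power laws on a hypothetical range and uses the Kish effective sample size (Σw)²/Σw² as a stand-in for the per-bin MC statistics; it is not a TRUEE feature. A harder MC sample (γ = 2.5) reweighted to γ = 3.7 is compared with an MC sample generated directly at γ = 3.7, and the reweighted harder sample should retain more effective events in the highest bins, where the 10-event criterion fails first.

```python
import random

def sample_power_law(gamma, x_min, x_max, n, rng):
    """Inverse-transform sampling of p(x) ~ x^-gamma on [x_min, x_max] (gamma != 1)."""
    a, b = x_min ** (1 - gamma), x_max ** (1 - gamma)
    return [(a + rng.random() * (b - a)) ** (1 / (1 - gamma)) for _ in range(n)]

def reweight(xs, gamma_mc, gamma_target):
    """Per-event weights that turn an x^-gamma_mc sample into an x^-gamma_target shape."""
    return [x ** (gamma_mc - gamma_target) for x in xs]

def effective_counts(xs, ws, edges):
    """Kish effective number of events per bin, (sum w)^2 / sum w^2."""
    n = len(edges) - 1
    s, s2 = [0.0] * n, [0.0] * n
    for x, w in zip(xs, ws):
        for i in range(n):
            if edges[i] <= x < edges[i + 1]:
                s[i] += w
                s2[i] += w * w
                break
    return [si * si / s2i if s2i > 0 else 0.0 for si, s2i in zip(s, s2)]

rng = random.Random(7)
edges = [10 ** (2 + 0.5 * k) for k in range(7)]       # hypothetical bins from 1e2 to 1e5
hard = sample_power_law(2.5, 1e2, 1e5, 100_000, rng)  # harder MC, reweighted below
soft = sample_power_law(3.7, 1e2, 1e5, 100_000, rng)  # MC as steep as the pseudo data
n_eff_hard = effective_counts(hard, reweight(hard, 2.5, 3.7), edges)
n_eff_soft = effective_counts(soft, [1.0] * len(soft), edges)
for k, (h, s) in enumerate(zip(n_eff_hard, n_eff_soft)):
    print(f"bin {k}: reweighted hard MC = {h:.1f}, steep MC = {s:.1f} effective events")
```

In an iterative scheme, the target index would be taken from a fit to the previous unfolding result rather than fixed in advance as it is here.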

4. Summary and outlook

The new unfolding software TRUEE has been tested within the astroparticle experiments MAGIC and IceCube and appears to be a very suitable tool for astroparticle physics, since it can properly estimate distributions which cover several orders of magnitude. The input of all samples is event-wise; thus the response matrix is calculated with an individual binning for every distinct case. A moderate deviation of the distribution of simulated events, used to determine the response matrix, from the data distribution is tolerable; thus a-priori knowledge of the exact spectral slope of the estimated distribution is not necessary.

In TRUEE, the uncertainties are calculated in the same way as in RUN. They have been shown to follow the Poisson distribution or, in the case of a large number of events, the Gaussian distribution [29]. Therefore, values outside the uncertainties can be excluded with a probability of at least 68%. An additional investigation to calculate confidence intervals is being carried out within the collaborative research center SFB 823. Within the same project, time-dependent unfolding will be implemented in TRUEE. Instead of unfolding separate time slices, TRUEE will be able to deliver a two-dimensional distribution. This is motivated by the fact that simply splitting the data sample into several packages along the time axis does not yield reasonable results in cases of low statistics.
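The 68% figure corresponds to the ±1σ coverage of a Gaussian, P(|X − μ| ≤ σ) = erf(1/√2) ≈ 0.683. The sketch below, an illustration that is not part of TRUEE, computes this value and, for comparison, the exact probability that a Poisson count falls inside μ ± √μ; at small μ this differs from 68% because of the discreteness of the Poisson distribution.

```python
import math

def gaussian_coverage(k):
    """P(|X - mu| <= k * sigma) for a Gaussian random variable X."""
    return math.erf(k / math.sqrt(2.0))

def poisson_coverage(mu):
    """Exact P(|N - mu| <= sqrt(mu)) for a Poisson count N with mean mu > 0."""
    lo = math.ceil(mu - math.sqrt(mu))
    hi = math.floor(mu + math.sqrt(mu))
    total = 0.0
    for n in range(max(lo, 0), hi + 1):
        # Poisson probability evaluated in log space to stay stable for large mu
        total += math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))
    return total

print(round(gaussian_coverage(1.0), 4))   # 0.6827
for mu in (5.0, 50.0, 500.0):
    print(mu, round(poisson_coverage(mu), 3))
```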

Furthermore, an option that automatically performs a second unfolding iteration with a re-weighted MC sample will be implemented. This avoids potentially large differences between the spectral distributions in MC and data, which is especially important for the acceptance correction within TRUEE. While the procedure itself has been presented and verified here, its integration into the program remains to be done.

TRUEE has been developed in the programming language C++. It contains the proven algorithm RUN and additional user-friendly functions, which make an unfolding analysis more convenient to handle. The software is easy to install and convenient to use in combination with modern software such as the ROOT framework. TRUEE and the original algorithm RUN deliver comparable results.

TRUEE is intended to be included in RooUnfold, the common framework for unfolding software [30]. Additionally, TRUEE is currently being tested by several particle physics groups.
815 several particle physics groups. 

Within the collaborative research center SFB 823, the fields of application of TRUEE will be expanded to problems in economics and engineering.

The TRUEE software and the user manual can be found at http://app.tu-dortmund.de/TRUEE/.